In the Claims 



1. (currently amended) A method for segmenting a video including a plurality of 
pixels into a plurality of video objects, comprising: 

assigning a feature vector to each pixel of the video; 

identifying selected pixels of the video as marker pixels; 

assembling each marker pixel and pixels adjacent to the marker pixel into a 
corresponding volume if the distance between the feature vector of the 
marker pixel and the feature vector of the adjacent pixels is less than a first 
predetermined threshold; 

assigning a first score and descriptors to each volume; 

sorting the volumes in a high-to-low order according to the first scores; and 

processing the volumes in the high-to-low order, the processing for each 
volume comprising: 

comparing the descriptor of the volume to the descriptor of an adjacent 
volume to determine a second score; 

combining the volume with the adjacent volume if the second score passes a 
second threshold to generate a video object in a multi-resolution video object tree; 
and 

repeating the comparing and combining steps until a single video volume 
representing the video remains. 
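
The assembling step of claim 1 can be illustrated with a minimal sketch. It assumes the video is a nested list indexed as video[t][y][x] holding one feature tuple per pixel, that "adjacent" means 6-connected in (t, y, x), and that the distance is Euclidean; none of these choices is stated in the claim, and the name grow_volume is hypothetical.

```python
from collections import deque
import math

# 6-connected neighbors in (t, y, x); this connectivity is an assumption.
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def grow_volume(video, marker, threshold, labels, label):
    """Grow one volume from `marker`, adding unlabeled adjacent pixels whose
    feature vector is within `threshold` of the marker's feature vector."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    ref = video[marker[0]][marker[1]][marker[2]]  # marker feature vector
    labels[marker] = label
    members = [marker]
    queue = deque([marker])
    while queue:
        t, y, x = queue.popleft()
        for dt, dy, dx in NEIGHBORS:
            n = (t + dt, y + dy, x + dx)
            inside = 0 <= n[0] < T and 0 <= n[1] < H and 0 <= n[2] < W
            if not inside or n in labels:
                continue
            # Euclidean distance between feature vectors (assumed metric).
            if math.dist(ref, video[n[0]][n[1]][n[2]]) < threshold:
                labels[n] = label
                members.append(n)
                queue.append(n)
    return members
```

The shared labels dictionary lets the same function be called once per marker pixel without a pixel being claimed by two volumes.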

2. (original) The method of claim 1 wherein each pixel has spatial (x,y) and time (t) 
coordinates to indicate a location of the pixel and the volumes in a spatial-temporal 
collocated overlapping scene of the video. 

3. (original) The method of claim 2 wherein the video includes a plurality of frames 
and further comprising: 

projecting a portion of each video object in a particular frame to intersect the 
projection of the video object in an adjacent frame to provide continuous silhouettes 
of the video object according to the time t coordinates. 

4. (currently amended) The method of claim 3 further comprising: 

applying a spatial-domain 2D median filter to the frames to remove 
intensity singularities, without disturbing edge formation. 
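
As an illustration of the filtering step of claim 4, the following sketch applies a square median window to one frame. The window radius, the clamped border handling, and the name median_filter_2d are assumptions made for the example; the claim does not specify them.

```python
from statistics import median

def median_filter_2d(frame, radius=1):
    """Return a copy of `frame` (a list of rows of intensities) where each pixel
    is replaced by the median of its (2*radius+1) x (2*radius+1) window."""
    H, W = len(frame), len(frame[0])
    out = [[0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            # Clamp coordinates at the border (an assumed boundary rule).
            window = [
                frame[min(max(y + dy, 0), H - 1)][min(max(x + dx, 0), W - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            out[y][x] = median(window)
    return out
```

A single outlier pixel is replaced by the value of its surroundings, while a straight step edge passes through unchanged, which is the behavior the claim relies on.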

5. (currently amended) The method of claim 1 further comprising: 

partitioning the video into a plurality of identically sized volumes; and 
selecting the pixel at the center of each volume as the marker pixels. 
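
The marker selection of claim 5 can be sketched as follows. The (T, H, W) shape convention, the block size parameter, and the name grid_markers are hypothetical; the sketch also assumes that any remainder left when the block size does not divide the video evenly is simply not covered.

```python
def grid_markers(shape, block):
    """Partition a video of size (T, H, W) into identically sized blocks of
    size (bt, bh, bw) and return the center pixel (t, y, x) of each block."""
    T, H, W = shape
    bt, bh, bw = block
    return [
        (t + bt // 2, y + bh // 2, x + bw // 2)
        for t in range(0, T - bt + 1, bt)
        for y in range(0, H - bh + 1, bh)
        for x in range(0, W - bw + 1, bw)
    ]
```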

6. (original) The method of claim 1 further comprising: 

determining a gradient magnitude ∇V = ∂V/∂x + ∂V/∂y + ∂V/∂t for each 
pixel in the video; 

selecting the pixel with a minimum gradient magnitude as the marker pixel; 
removing pixels in a predetermined neighborhood around the marker; and 
repeating the selecting and removing steps until no pixels remain. 
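
The select-and-remove loop of claim 6 can be sketched as below for a scalar video V[t][y][x]. The forward-difference approximation of the derivatives, the use of absolute values for each term, the cubic neighborhood of a given radius, and the name select_markers are all assumptions made for the example.

```python
def select_markers(V, radius):
    """Pick marker pixels in order of increasing gradient magnitude, discarding
    the cubic neighborhood of `radius` around each chosen marker."""
    T, H, W = len(V), len(V[0]), len(V[0][0])

    def grad(t, y, x):
        # Forward differences, clamped at the borders (assumed discretization).
        dx = V[t][y][min(x + 1, W - 1)] - V[t][y][x]
        dy = V[t][min(y + 1, H - 1)][x] - V[t][y][x]
        dt = V[min(t + 1, T - 1)][y][x] - V[t][y][x]
        return abs(dx) + abs(dy) + abs(dt)

    order = sorted(
        (grad(t, y, x), (t, y, x))
        for t in range(T) for y in range(H) for x in range(W)
    )
    remaining = {p for _, p in order}
    markers = []
    # Visiting pixels in sorted order and skipping removed ones is equivalent
    # to repeatedly selecting the minimum among the remaining pixels.
    for _, (t, y, x) in order:
        if (t, y, x) not in remaining:
            continue
        markers.append((t, y, x))
        for nt in range(t - radius, t + radius + 1):
            for ny in range(y - radius, y + radius + 1):
                for nx in range(x - radius, x + radius + 1):
                    remaining.discard((nt, ny, nx))
    return markers
```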

7. (original) The method of claim 1 wherein the feature vector is based on 
a color of the pixel. 

8. (original) The method of claim 1 further comprising: 

merging volumes less than a minimum size with adjacent volumes. 

9. (original) The method of claim 8 wherein the minimum size is less than 0.001 of 
the volume representing the video. 

10. (currently amended) The method of claim 9 further comprising: 

sorting the volumes in an increasing order of size; 

processing the volumes in the increasing order, the processing for each 
volume comprising: 

including each pixel of the volume in a closest volume until all volumes 
less than the minimum size are processed. 
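
The small-volume merging of claims 8 to 10 can be sketched as follows. The sketch assumes each volume is stored as a list of scalar pixel features and that "closest" means closest in mean feature value; it ignores spatial adjacency for brevity. The data layout and the name merge_small_volumes are hypothetical.

```python
def merge_small_volumes(volumes, fraction=0.001):
    """volumes: dict mapping a label to a list of scalar pixel features.
    Dissolve volumes smaller than `fraction` of the whole video, smallest
    first, moving each pixel into the volume with the closest mean feature."""
    total = sum(len(v) for v in volumes.values())
    min_size = fraction * total
    small = sorted(
        (label for label in volumes if len(volumes[label]) < min_size),
        key=lambda label: len(volumes[label]),
    )
    for label in small:
        # A small volume may have grown past the limit by absorbing pixels.
        if len(volumes[label]) >= min_size or len(volumes) == 1:
            continue
        pixels = volumes.pop(label)
        for f in pixels:
            means = {l: sum(v) / len(v) for l, v in volumes.items()}
            target = min(means, key=lambda l: abs(means[l] - f))
            volumes[target].append(f)
    return volumes
```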

11. (original) The method of claim 1 wherein the descriptors include self 
descriptors of the volume, and mutual descriptors of the volume and the adjacent 
volume. 

12. (original) A method for segmenting a video sequence of frames, each frame 
including a plurality of pixels, comprising: 

partitioning all of the pixels of all frames of the video into a plurality of 
volumes according to features of each pixel, the pixels of each volume having 
frame-based spatial coordinates and sequence-based temporal coordinates; 

assigning descriptors to each volume; 

representing each volume as a video object at a lowest level in a multi- 
resolution video object tree; and 

iteratively combining volumes according to the descriptors, and representing 
each combined volume as a video object at intermediate levels of the multi- 
resolution video object tree, until all of the combined volumes form the entire video 
represented as a video object at a highest level of the multi-resolution video object 
tree. 
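
The iterative combining of claim 12 can be illustrated with a minimal sketch that builds the tree bottom-up. It assumes each volume carries a single scalar descriptor (a mean feature) and a size, that the most similar pair is merged at each step, and that every pair of volumes is treated as adjacent; the claim fixes none of these, and the name build_object_tree is hypothetical.

```python
def build_object_tree(leaves):
    """leaves: dict mapping a name to (mean_feature, size), at least one entry.
    Returns the root node of a binary tree of merged video objects."""
    nodes = [
        {"name": n, "mean": m, "size": s, "children": []}
        for n, (m, s) in leaves.items()
    ]
    while len(nodes) > 1:
        # Pick the pair with the most similar descriptors (assumed criterion).
        i, j = min(
            ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda p: abs(nodes[p[0]]["mean"] - nodes[p[1]]["mean"]),
        )
        a, b = nodes[i], nodes[j]
        size = a["size"] + b["size"]
        merged = {
            "name": a["name"] + "+" + b["name"],
            "mean": (a["mean"] * a["size"] + b["mean"] * b["size"]) / size,
            "size": size,
            "children": [a, b],  # lower-resolution level of the tree
        }
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]
```

The leaves are the lowest level of the tree, each merge adds an intermediate level, and the returned root represents the entire video.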